rewrite the text classification demo. #83

Merged
lcy-seso merged 2 commits into PaddlePaddle:develop from the rewrite_text_classification branch
Jun 14, 2017

lcy-seso
Collaborator

rewrite the text classification example.

@lcy-seso requested a review from llxxxll on June 10, 2017 06:26
@lcy-seso force-pushed the rewrite_text_classification branch 21 times, most recently from eeccb28 to f884a61 on June 12, 2017 07:59
@@ -51,10 +37,10 @@ def main():
         learning_rate_schedule="discexp", )

     train_reader = paddle.batch(
-        paddle.reader.shuffle(reader.test_reader("train.list"), buf_size=1000),
+        paddle.reader.shuffle(reader.train_reader("train.list"), buf_size=1000),
Member

Is train.list a text dataset? I could not find it in the directory.

Collaborator Author

This change fixes a bug in the image classification example. The examples do currently read their data in inconsistent ways; I will submit a follow-up PR to revise the image classification example.
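For readers unfamiliar with the reader pipeline this fix touches, here is a minimal pure-Python sketch of what a buffered-shuffle decorator and a batching decorator do. It is an illustration only and not PaddlePaddle's implementation: `shuffle`, `batch`, and `list_reader` are hypothetical stand-ins written for this note, and the integer samples are made up. It mirrors the corrected wiring, where the training split is shuffled and then batched while the test split is batched in its original order.

```python
import random


def shuffle(reader, buf_size, seed=None):
    """Wrap a reader so samples are shuffled within buffers of buf_size."""
    rng = random.Random(seed)

    def shuffled_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        # Flush the last, partially filled buffer.
        rng.shuffle(buf)
        for item in buf:
            yield item

    return shuffled_reader


def batch(reader, batch_size):
    """Wrap a reader so it yields lists of up to batch_size samples."""

    def batched_reader():
        current = []
        for sample in reader():
            current.append(sample)
            if len(current) == batch_size:
                yield current
                current = []
        if current:
            yield current

    return batched_reader


def list_reader(samples):
    """Hypothetical stand-in for a dataset reader over one data split."""

    def reader():
        for sample in samples:
            yield sample

    return reader


# Training data: shuffled, then batched. Test data: batched in order.
train_reader = batch(
    shuffle(list_reader(range(10)), buf_size=4, seed=0), batch_size=3)
test_reader = batch(list_reader(range(100, 105)), batch_size=3)
```

Calling `train_reader()` yields every training sample exactly once in a locally shuffled order, and `test_reader()` yields the test samples unshuffled. Swapping the two underlying readers, as in the line removed by the diff, would silently train on the wrong split.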

         batch_size=BATCH_SIZE)
     test_reader = paddle.batch(
-        reader.train_reader("test.list"), batch_size=BATCH_SIZE)
+        reader.test_reader("test.list"), batch_size=BATCH_SIZE)
Member

Is test.list a text dataset? I could not find it in the directory.

Collaborator Author

This change fixes a bug in the image classification example. The examples do currently read their data in inconsistent ways; I will submit a follow-up PR to revise the image classification example.

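├── run.sh # Run this script to start a training job directly with the default parameters
├── train.py # Training script
└── utils.py # Common helper functions, e.g. printing logs, parsing command-line arguments, building and loading dictionaries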
```
Member

A quick start section is missing. It should contain the directory description above, a training guide (explaining what each part of the training log output means), and a prediction guide. The prediction guide could be a code snippet (for reference, see the code and output under "Feature 1) Word segmentation" at https://www.oschina.net/p/jieba/?fromerr=btIKdxHH), or it could be a GIF.

Member

Next should come the "modifying parameters" section that was previously at the end. That way, once newcomers have produced a first result, they can compare how changing different parameters changes the outcome. This leads naturally into how to choose between CNN and DNN.

Member

The remaining sections are really all explaining what these "parameters" mean, so they should be ordered accordingly.

cost = paddle.layer.classification_cost(input=output, label=lbl)

return cost, output, lbl
```
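Member

If the document is aimed at beginners, customizing the data format matters far more than the other information, because it may be the only thing they "need to think about". It should therefore be moved up to come right after the quick start, and an anchor link to it can be added at this point in the detailed explanation.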

#!/bin/sh

python train.py \
--nn_type="dnn" \
Member

Why has training been changed back to passing arguments from a shell script? By convention, shouldn't these all be written in train.py?

Collaborator Author

The shell script specifies the arguments for train.py, so you can simply run the script. Otherwise you would have to type a long string of arguments on the command line.



def train(topology,
train_data_dir=None,
Member

lm_rnn.py has run_type=GRU #'or LSTM'. Could a parameter for selecting the method, e.g. DNN or CNN, be added here as well?

Collaborator Author

This parameter already exists: nn_type specifies which model to use.
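The exchange above can be sketched as follows: a command-line flag named `--nn_type` (the flag and the value "dnn" appear in run.sh) is parsed and mapped to a model-building function that would then be passed to `train(topology, ...)`. This is a minimal sketch under assumptions, not the code in train.py: the default value, the allowed choices, and the placeholder functions `dnn_net`, `cnn_net`, `build_parser`, and `select_topology` are hypothetical names introduced for illustration.

```python
import argparse


def build_parser():
    """Build a parser exposing the model-selection flag (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Train a text classifier (illustrative sketch).")
    parser.add_argument(
        "--nn_type",
        choices=["dnn", "cnn"],  # assumed choices, based on the thread
        default="dnn",  # assumed default
        help="Which network to train: 'dnn' or 'cnn'.")
    return parser


def dnn_net():
    """Hypothetical placeholder for the function defining the DNN model."""
    return "dnn topology"


def cnn_net():
    """Hypothetical placeholder for the function defining the CNN model."""
    return "cnn topology"


# Map each flag value to the function that builds the network.
TOPOLOGIES = {"dnn": dnn_net, "cnn": cnn_net}


def select_topology(nn_type):
    """Return the network-building function for the given nn_type."""
    return TOPOLOGIES[nn_type]


# Equivalent to running: python train.py --nn_type="cnn"
args = build_parser().parse_args(["--nn_type", "cnn"])
topology = select_topology(args.nn_type)
```

Keeping the flag in the parser with a default means a script such as run.sh only needs to list the arguments it overrides, while `python train.py` alone still runs.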

@lcy-seso force-pushed the rewrite_text_classification branch from f884a61 to b1231a5 on June 13, 2017 09:07
@lcy-seso force-pushed the rewrite_text_classification branch from b1231a5 to 501ce21 on June 13, 2017 09:13
Collaborator Author

@lcy-seso left a comment

Followed the review comments.

@lcy-seso force-pushed the rewrite_text_classification branch from 6068a69 to a0529eb on June 13, 2017 10:18
luotao1 commented Jun 13, 2017
Contributor

Does the section order of the README need to be adjusted? Should the model introduction and the detailed model explanation be moved to the very beginning?

@lcy-seso force-pushed the rewrite_text_classification branch 4 times, most recently from 74980fe to ba69ba5 on June 13, 2017 11:07
@lcy-seso force-pushed the rewrite_text_classification branch from ba69ba5 to 136a60d on June 13, 2017 11:10
@lcy-seso merged commit f27154e into PaddlePaddle:develop on Jun 14, 2017
@lcy-seso deleted the rewrite_text_classification branch on June 16, 2017 10:29
3 participants